Abstract

Online social systems are multiplex in nature as multiple links
may exist between the same two users across different social
networks. In this work, we introduce a framework for studying
links and interactions between users beyond the individual
social network. Exploring the cross-section of two popular
online platforms - Twitter and location-based social network
Foursquare - we represent the two together as a composite
multilayer online social network. Through this paradigm
we study the interactions of pairs of users differentiating
between those with links on one or both networks. We find that
users with multiplex links, who are connected on both networks,
interact more and have greater neighbourhood overlap
on both platforms, in comparison with pairs who are connected
on just one of the social networks. In particular, the
most frequented locations of users are considerably closer,
and similarity is considerably greater among multiplex links.
We present a number of structural and interaction features,
such as the multilayer Adamic/Adar coefficient, which are
based on the extension of the concept of the node neighbourhood
beyond the single network. Our evaluation, which
aims to shed light on the implications of multiplexity for the
link generation process, shows that multilayer features,
constructed from properties across social networks, perform better
than their single network counterparts in predicting links
across networks. We propose that combining information
from multiple networks in a multilayer configuration can provide
new insights into user interactions on online social networks,
and can significantly improve link prediction overall
with valuable applications to social bootstrapping and friend
recommendations.


Introduction 

Online social media has become an ecosystem of overlapping
and complementary social networking services,
inherently multiplex in nature, as multiple links may
exist between the same pair of users (Kivela et al.
2014). Multiplexity is a well studied property in the social
sciences (Haythornthwaite and Wellman 1998) and
it has been explored in social networks from Renaissance
Florence (Padgett and McLean 2006) to the Internet
age (Haythornthwaite 2005). Despite the broad contextual
differences, multi-relational ties are consistently
found to exhibit greater intensity of interactions across different
communication channels, and therefore a stronger
bond (Haythornthwaite and Wellman 1998; Hristova, Musolesi,
and Mascolo 2014). Nevertheless, there is a lack of
research about online social networks and their value from
a multiplex perspective.

Recently, empirical models of multilayer networks have
emerged to address the multi-relational nature of social networks
(Kivela et al. 2014; Szell, Lambiotte, and Thurner
2010). In such models, interactions are considered as layers
in a systemic view of the social network. We adopt
such a model in our analysis, where we shift the concept
of a link and neighbourhood to encompass more than one
network. This allows us to study interactions and structural
properties across online social networks (OSNs), addressing
the need for further understanding of their complementary
and overlapping nature, and multiplexity online.
Although there have been some recent comparative studies
of multiple online social networks (Ottoni et al. 2014;
Lee et al. 2014), and their intersection (Szell, Lambiotte,
and Thurner 2010), the application of multiplex network
properties to OSNs is yet to be substantially addressed.

In this work, we explore intersecting networks, multiplex
ties, and their application to link prediction across
OSNs. Link prediction systems are key components of social
networking services due to their practical applicability
to friend recommendations and social network bootstrapping,
as well as to understanding the link generation process.
Link prediction is a well-studied problem, explored
in the context of both OSNs and location-based social networks
(LBSNs) (Liben-Nowell and Kleinberg 2007; Menon
and Elkan 2011; Crandall et al. 2010; Scellato, Noulas, and
Mascolo 2011). However, only very few link prediction
works tackle multiple networks at a time (Lee et al. 2014;
Tang, Lou, and Kleinberg 2012), while most link prediction
systems only employ features internal to the network under
prediction, without considering additional link information
from other OSNs.

Our main contributions can be summarised as follows: 


• We generalise the notion of a multilayer online social
network, and extend definitions of neighbourhood to span
multiple networks, adapting measures of overlap such
as the Adamic/Adar coefficient in social networks to the
multilayer context.




















































• We find that pairs with links on both Twitter and 
Foursquare exhibit significantly higher interaction on 
both social networks in terms of number of mentions and 
colocation within the same venues, as well as a lower 
distance and higher number of common hashtags in their 
tweets. 

• A significantly higher overlap can be observed between
the neighbourhoods of nodes with links on both networks,
in particular with relation to the Adamic/Adar measure of
neighbourhood overlap, which is significantly more expressed
in the multilayer neighbourhood.

• In our evaluation, we predict Twitter links from
Foursquare features and vice versa, and we achieve this
with AUC scores up to 0.86 on the different datasets. In
predicting links which span both networks, we achieve
the highest AUC score of 0.88 from our multilayer features
set, proving the multilayer construct a useful tool
for social bootstrapping and friend recommendations.

The remainder of this work details these contributions,
and summarises related work, concluding with a discussion
of the implications, limitations, and applications of the proposed
framework.


Related Work 

Our work relates to three main areas: multi-relational
social networks, media multiplexity, and link prediction in
online social networks. We summarise the state of the art in
these areas in the following sections.


Multilayer Social Networks 

Multi-relational or multilayer networks have been explored
in the context of a wide range of systems from global air
transportation (Cardillo et al. 2013) to massive online multiplayer
games (Szell, Lambiotte, and Thurner 2010). A
comprehensive review of multilayer network models can be
found in (Kivela et al. 2014). In the context of social networks,
it is generally accepted that the more information
we can obtain about the relationship between people, the
more insight we can gain. A recent large-scale study on
the subject has demonstrated the need for multi-channel data
when comprehensively studying social networks (Stopczynski
et al. 2014). Despite the observable multilayer nature
of the composite OSNs of users (Kivela et al. 2014;
Kazienko et al. 2010; Brodka and Kazienko 2012), most
research efforts have been focused on theoretical modelling
(Kivela et al. 2014), with little to no empirical work
exploiting data-driven applications in the domain of multilayer
OSNs, especially with respect to how location-based
and social interactions are coupled in the online social space.
We attempt to fill these gaps in the present work by presenting
a generalisable online multilayer framework applied
to classic problems such as link prediction in OSNs. Our
framework is strongly motivated by the theory of media multiplexity,
which we review next.


Media Multiplexity 

Media multiplexity (Haythornthwaite 2005) is the principle
that tie strength is observed to be greater when the number
of media channels used to communicate between two
people is greater (higher multiplexity). In (Haythornthwaite
and Wellman 1998) the authors studied the effects of media
use on relationships in an academic organisation and found
that those pairs of participants who utilised more types of
media (including email and videoconferencing) interacted
more frequently and therefore had a closer relationship, such
as friendship. More recently, multiplexity has been studied
in light of multilayer communication networks, where
the intersection of the layers was found to indicate a strong
tie, while single-layer links were found to denote a weaker
relationship (Hristova, Musolesi, and Mascolo 2014). The
strength of social ties is an important consideration in friend
recommendations and link prediction (Gilbert and Karahalios
2009), and we employ the previously understudied
multiplex properties of OSNs to such ends in this work.


Link Prediction 

The problem of link prediction was first introduced in the
seminal work of Kleinberg et al. (Liben-Nowell and Kleinberg
2007) and since then, has been applied in various network
domains. For instance, in (Scellato, Noulas, and Mascolo
2011) the authors exploit place features in location-based
services to recommend friendships, and in (Backstrom
and Leskovec 2011) a new model based on supervised random
walks is proposed to predict new links in Facebook.
Most of these works build on features that are endogenous
to the system that hosts the social network of users. In our
evaluation, however, we train and test on heterogeneous networks.
In a similar spirit, the authors in (Sadilek, Kautz,
and Bigham 2012) show how using both location and social
information from the same network significantly improves
link prediction. Our approach differs in that it frames the
link prediction task in the context of multilayer networks
and empirically shows the relationship between two different
systems - Foursquare and Twitter - by mining features
from both. Before presenting our framework and analysis,
we next state the research questions we are interested in
answering through this work.


Research Questions 

In light of the related work presented above, our goal is to
bridge the gap between multilayer network models, media
multiplexity properties, and link prediction systems. More
specifically, we address the following research questions in
this work:


RQ1: How do structural properties such as degree
extend into the multilayer neighbourhood? We propose a
multilayer version of the network neighbourhood, which
extends it to multiple networks (layers) and observe how
such structural properties are manifested across Twitter and
Foursquare.

RQ2: What are the structural and behavioural differences
between single network and multiplex links? In order
to understand the value of multiplex links (users connected 
on more than one network), we observe how they compare 















































to single network links in terms of neighbourhood overlap, 
Twitter interaction, similarity and mobility in Foursquare. 

RQ3: Can we use information about links from one 
layer to predict links on the other? Many online social 
systems suffer from a lack of initial user adoption. Although 
many social networks nowadays incorporate the option of 
importing contacts from another pre-existing network and 
copying links, this method does not offer a ranking of users 
by relevance targeted towards the specific platform. 

RQ4: Can we predict links which exist on more than 
one network (i.e., multiplex links)? Media multiplexity 
is a valuable source of tie strength information, and has 
further structural implications, which are of interest to OSN 
services and link prediction systems. We would like to 
explore the potential of identifying such links for building 
more successful online communities. 

We will next present our multilayer framework for
OSNs, and study user behaviour and properties across
Twitter and Foursquare, extending our analysis to multiplex
links in comparison with single-layer links. We finally
integrate this into a link prediction system for OSNs, where
we evaluate the utility of the metrics and features described
in this work in the hope of answering the questions posed above.

Multi-relational Framework 

The network of human interactions is usually represented by
a graph G where the nodes in set V represent people and the
edges E represent interactions. While this representation
has been immensely helpful for the uncovering of many social
phenomena, it is focused on a single-layer abstraction of
human relations. In this section, we describe a model that
represents the multiplexity of OSNs by supporting multiple
friendship and interaction links.

Multilayer Online Social Network 

We represent the parallel interactions between nodes across
OSNs as a multilayer network $\mathcal{M}$, an ensemble of $M$ graphs,
each corresponding to an OSN. We indicate the $\alpha$-th layer
of the multilayer as $G^\alpha(V^\alpha, E^\alpha)$, where $V^\alpha$ and $E^\alpha$ are
the sets of vertices and edges of the graph $G^\alpha$. We can
then denote the sequence of graphs composing the $M$-layer
multilayer graph as $\mathcal{M} = \{G^1, \ldots, G^\alpha, \ldots, G^M\}$. The graphs
are brought together as a multilayer system by the common
members across layers as illustrated in Figure 1a.

Multilayer social networks are a natural representation of
media multiplexity, as each layer can depict an OSN. Figure 1a
illustrates the case at hand, where there are two OSN
platforms represented by $G^\alpha$ and $G^\beta$. Members need not be
present at all layers and the multilayer network is not limited
to two layers. While each platform can be explored separately
as a network in its own right, this does not capture
the dimensionality of online social life, which spans across
multiple OSNs.

Figure 1b illustrates three link types as observed in Figure 1a
for the case of a two layer network. Firstly, we define



(a) Multilayer Social Network 


(b) Link types 


Figure 1: Multilayer model of OSNs with I. Multiplex link;
II. Single-layer link on $G^\alpha$; and III. Single-layer link on $G^\beta$.


a multiplex link between two nodes $i$ and $j$ as a link that exists
between them in at least two layers $\alpha, \beta \in \mathcal{M}$. Second,
we say that a single-layer link between two nodes $i$ and $j$
exists if the link appears only in one layer in the multilayer
social network. In systems with more layers, multiplexity
can take on a value depending on how many layers the link
is present on. In the case at hand, given layer $\alpha$ and layer
$\beta$, we denote the set of all links present in the multilayer
network as $E^{\alpha \cup \beta}$, which yields the global connectivity.

We also define the set of multiplex links as $E^{\alpha \cap \beta}$ and the
set of all single-layer links on layer $\alpha$ only as $E^{\alpha \setminus \beta}$. These
multilayer edge sets can be further extended to the $M$ layer
network by considering more layers $\{1, \ldots, M\}$ as part of
the intersection or union of graphs. The presence of multiplex
and single-layer links in the above edge sets defines
the multilayer neighbourhood of nodes in the network, as
expanded upon next.
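The edge sets above can be illustrated with a minimal Python sketch. It assumes each layer is given as a list of undirected user pairs; the function names and the toy users are hypothetical and not taken from the dataset.

```python
def undirected(edges):
    """Store each link as a frozenset so (u, v) and (v, u) compare equal."""
    return {frozenset((u, v)) for u, v in edges if u != v}


def multilayer_edge_sets(edges_alpha, edges_beta):
    """Return the union, multiplex and single-layer edge sets of two layers."""
    alpha, beta = undirected(edges_alpha), undirected(edges_beta)
    return {
        "all": alpha | beta,         # union: global connectivity
        "multiplex": alpha & beta,   # intersection: links on both layers
        "alpha_only": alpha - beta,  # single-layer links on alpha only
        "beta_only": beta - alpha,   # single-layer links on beta only
    }


# Hypothetical toy layers (illustrative names only).
twitter = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
foursquare = [("bob", "ann"), ("cat", "dan"), ("dan", "eve")]
edge_sets = multilayer_edge_sets(twitter, foursquare)
print({name: len(links) for name, links in edge_sets.items()})
```

Using frozensets keeps the comparison independent of the order in which a pair is listed, which matches the undirected links considered in this work.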

The Multilayer Neighbourhood 

Following our definition of a multilayer online social network,
we can redefine the ego network of a node as the multilayer
neighbourhood. While the simple node neighbourhood
is the collection of nodes one hop away from the ego,
the multilayer global neighbourhood (denoted by GN) of a
node $i$ can be derived as the set of unique neighbours
across layers:

$$\Gamma_{GN_i} = \{ j \in V^{\mathcal{M}} : e_{i,j} \in E^{\alpha \cup \beta} \} \quad (1)$$

and their global multilayer degree as:

$$k_{GN_i} = |\Gamma_{GN_i}| \quad (2)$$

which provides insight into the entire connectivity of
nodes across layers, and can therefore be interpreted as a
global measure of the immediate degree of a node. We can
similarly define the core neighbourhood (denoted by CN)
of a node $i$ across layers of the multilayer network as:

$$\Gamma_{CN_i} = \{ j \in V^{\mathcal{M}} : e_{i,j} \in E^{\alpha \cap \beta} \} \quad (3)$$

and their core multilayer degree as:

$$k_{CN_i} = |\Gamma_{CN_i}| \quad (4)$$


where we only consider neighbours which exist across all 
layers. This simple formulation allows for powerful exten¬ 
sions of existing metrics of local neighbourhood similarity. 
We can define the overlap (Jaccard similarity) of two users i 
and j’s global neighbourhoods as: 




$$\frac{|\Gamma_{GN_i} \cap \Gamma_{GN_j}|}{|\Gamma_{GN_i} \cup \Gamma_{GN_j}|} \quad (5)$$


where the number of common friends is divided by the
number of total friends of $i$ and $j$. The same can be done for
the core neighbourhoods of two users. The Jaccard coefficient, often
used in information retrieval, has also been widely used in
link prediction (Liben-Nowell and Kleinberg 2007).
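As a rough illustration of Equations 1 to 5, the sketch below builds the global and core neighbourhoods for a two-layer toy network and computes their degrees and Jaccard overlap. The helper names and the four toy users are assumptions made for this example only.

```python
from collections import defaultdict


def adjacency(edges):
    """Map each node to the set of its neighbours in one undirected layer."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def global_neighbourhood(node, adj_alpha, adj_beta):
    """Neighbours on at least one layer (Equation 1)."""
    return adj_alpha.get(node, set()) | adj_beta.get(node, set())


def core_neighbourhood(node, adj_alpha, adj_beta):
    """Neighbours on both layers (Equation 3)."""
    return adj_alpha.get(node, set()) & adj_beta.get(node, set())


def jaccard(first, second):
    """Common neighbours divided by total neighbours (Equation 5)."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


# Hypothetical toy layers.
adj_t = adjacency([("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("bob", "dan")])
adj_f = adjacency([("ann", "bob"), ("ann", "dan"), ("bob", "dan")])

gn_ann = global_neighbourhood("ann", adj_t, adj_f)  # {"bob", "cat", "dan"}
gn_bob = global_neighbourhood("bob", adj_t, adj_f)  # {"ann", "cat", "dan"}
cn_ann = core_neighbourhood("ann", adj_t, adj_f)    # {"bob"}
k_gn_ann, k_cn_ann = len(gn_ann), len(cn_ann)       # Equations 2 and 4
overlap = jaccard(gn_ann, gn_bob)
```

In this toy case the two global neighbourhoods share two of four distinct neighbours, so the overlap is 0.5.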

We can further extend our definition of the multilayer
neighbourhood to the Adamic/Adar coefficient for link likelihood
(Adamic and Adar 2001), which considers the overlap
of two neighbourhoods based on the popularity of common
friends (originally through web pages) in a single-layer
network as:

$$aa\_sim_{GN_{ij}} = \sum_{z \in \Gamma_{GN_i} \cap \Gamma_{GN_j}} \frac{1}{\log(|\Gamma_{GN_z}|)} \quad (6)$$


where it is applied to the global common neighbours between
two nodes but can be equally applied to their core
neighbourhoods. This metric has been shown to be successful in
link prediction in its original single-layer form in both
social networks and location-based networks (Liben-Nowell
and Kleinberg 2007; Scellato, Noulas, and Mascolo 2011).
In the present work, we aim to show its applicability to the
multilayer space in predicting online social links across and
between Twitter and Foursquare. We will next describe the
specific datasets to which we apply this framework.
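A minimal sketch of Equation 6 follows, assuming the global neighbourhoods have already been computed and are stored in a dictionary. The dictionary contents and function name are hypothetical. The guard against neighbourhoods of size one is an addition of this sketch, since log(1) = 0 would otherwise cause a division by zero.

```python
import math


def adamic_adar(i, j, neighbourhoods):
    """Sum 1 / log(|neighbourhood of z|) over common neighbours z of i and j."""
    common = neighbourhoods.get(i, set()) & neighbourhoods.get(j, set())
    score = 0.0
    for z in common:
        size = len(neighbourhoods.get(z, set()))
        if size > 1:  # skip log(1) = 0 (assumption of this sketch)
            score += 1.0 / math.log(size)
    return score


# Hypothetical global neighbourhoods for four users.
global_n = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "bob"},
}
aa_ann_bob = adamic_adar("ann", "bob", global_n)
```

Passing core neighbourhoods instead of global ones gives the core variant of the same measure without changing the function.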


Dataset 

Twitter and Foursquare are two of the most popular social
networks, both with respect to research efforts and user
base. They have distinct broadcasting functionalities - microblogging
and check-ins. While Twitter can reveal a lot
about user interests, Foursquare check-ins provide a proxy
for human mobility. In Foursquare users check in to venues
that they visit through their location enabled devices, and
share their visit or opinion of a place with their friends.
Foursquare is two years younger than Twitter and its broadcasting
functionality is exclusively for mobile users (50M
to date¹), while 80% of Twitter’s 284M users are active on
mobile². Twitter generally allows anyone to “follow” and be
“followed”, where followers and followed do not necessarily
know one another. On the other hand, Foursquare supports
undirected links, referred to as “friendship”. A similar undirected
relationship can be constructed from Twitter, where a
link can be considered between two users if they both follow
¹https://foursquare.com/about

²https://about.twitter.com/company



Figure 2: Social network graph for San Francisco. Blue 
edges are single-layer edges, while pink edges are multiplex 
edges. The node size is proportional to the global degree of 
that node. 


each other reciprocally (Kwak et al. 2010). Since we are
ultimately interested in predicting friendship, we consider
only reciprocal Twitter links throughout this work.
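The construction of undirected Twitter links from reciprocal follows can be sketched as below, assuming the follow relation is available as (follower, followed) pairs. The function name and toy data are hypothetical.

```python
def reciprocal_links(follows):
    """Keep an undirected link only when both users follow each other."""
    follows = set(follows)
    return {
        frozenset((u, v))
        for u, v in follows
        if u != v and (v, u) in follows
    }


# Hypothetical directed follow pairs: (follower, followed).
follows = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("dan", "ann")]
twitter_links = reciprocal_links(follows)
```

Here only ann and bob follow each other, so theirs is the only link retained.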

Our dataset was collected from Twitter and Foursquare in 
the United States between May and September 2012, where 
tweets and check-ins were downloaded for users who had 
checked-in during that time, and where those check-ins were 
shared on Twitter. This allows us to study the intersection of 
the two networks through a subset of users who have ac¬ 
counts and are active on both Twitter and Foursquare, and 
have chosen to share their check-ins to Twitter. 


Property                        New York      Chicago           SF          All
$|V^{\mathcal{M}}|$                6,401        2,883        1,705       10,989
$|E^{T \cap F}|$                   9,101        5,486        1,517       16,104
$|E^{T \setminus F}|$             13,623        7,949        1,776       23,348
$|E^{F \setminus T}|$              6,394        4,202          863       11,459
$\langle k_{GN} \rangle$            4.55         6.12         2.44         4.63
$\langle k_{CN} \rangle$            1.42          1.9         0.89         1.47
tweets                         2,509,802    1,288,865      632,780    4,431,447
checkins                         228,422      105,250       46,823      380,495
venues                            24,110       11,773        6,934       42,817


Table 1: Dataset properties: number of users (nodes); 
number of multiplex links (edges); number of Twitter and 
Foursquare only edges; average global and core degrees; ac¬ 
tivity and venues per city. 


We focus our analysis on the top three cities in terms of
activity during the period. Table 1 shows the details for
each city, in terms of activity and venues, multilayer edges
and degrees for each network, where $E^{T \cap F}$ denotes the
set of edges which exist on both Twitter and Foursquare,
and $E^{T \setminus F}$ and $E^{F \setminus T}$ are the sets of edges on Twitter only and
Foursquare only respectively.

Figure 2 additionally illustrates the case of San Francisco,
where blue edges represent single-layer links on either
Foursquare or Twitter, and pink edges represent multiplex
links on both. We use a Fruchterman Reingold graph
layout (Fruchterman and Reingold 1991) to show the core-periphery
structure of the network, with larger nodes having
a higher global degree $k_{GN}$. In the following section, we
discuss the implications of these sets in detail, where we
consider all three cities together, and later evaluate each one
separately.

Figure 3: Multilayer degrees of users in comparison to each other and to activity volume on both networks.


Multilayer Analysis 

We begin our analysis by exploring the intersection between
the Twitter and Foursquare social networks. We observe
the degree properties of users across the two networks at a larger
scale for all three cities, while later we perform our evaluation
on each city separately.


RQ1: Multilayer Degrees

We introduced two degree metrics based on the multilayer 
neighbourhood of a node in Equations 2 and 4, where the 
global neighbourhood is equivalent to the union of neigh¬ 
bours on both networks, and the core neighbourhood is 
equivalent to the intersection of neighbours across both net¬ 
works. In this section we consider how the degrees relate to 
user activity and each other. 

In both cases (Figures 3a and 3b), users with high activity
on both networks, and in particular with high Twitter activity,
have the highest degrees in both the core and global
neighbourhoods. When we compare the two in Figure 3d
we observe that their joint distribution follows the long tail
exhibited in single-layer social networks as well. Further,
we observe the multiplex overlap ratio of the core to global
neighbourhood degrees in Figure 3c. This is simply the core
over the global degree:

$$mor_i = \frac{k_{CN_i}}{k_{GN_i}} \quad (7)$$

which indicates the percentage of multiplex links in $i$'s multilayer
neighbourhood. High activity nodes across both layers
at the centre of Figure 3c have the highest overlap.

In Figure 3d we compare the two multilayer degrees. We
note that the majority of users have a low degree in both,
and there is a relationship between the two. The core degree
is bounded by the global degree and is always a fraction
of it, while the global degree may never exceed the sum of
the individual layer degrees. This relationship is apparent
in the figure, where the highest degree users are those who
have a large number of links which overlap (multiplex links).
This may be because these users are more engaged
across the two platforms. We further explore the value of
link multiplexity in the following section.
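The degree bounds described above and the overlap ratio of Equation 7 can be checked on a small example. The sketch assumes a user's neighbours on each layer are given as sets; the names are illustrative.

```python
def multilayer_degrees(neigh_alpha, neigh_beta):
    """Return (global degree, core degree, multiplex overlap ratio)."""
    k_global = len(neigh_alpha | neigh_beta)
    k_core = len(neigh_alpha & neigh_beta)
    ratio = k_core / k_global if k_global else 0.0
    return k_global, k_core, ratio


# Hypothetical neighbours of one user on each layer.
twitter_friends = {"bob", "cat", "dan"}
foursquare_friends = {"bob", "dan", "eve", "fay"}
k_gn, k_cn, mor = multilayer_degrees(twitter_friends, foursquare_friends)

# Core degree <= global degree <= sum of the single-layer degrees.
bounds_hold = k_cn <= k_gn <= len(twitter_friends) + len(foursquare_friends)
```

For this user the global degree is 5 and the core degree is 2, so 40% of the multilayer neighbourhood consists of multiplex links.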


RQ2: Link Multiplexity 

We study the three types of links as described in our multilayer
model above: multiplex links on both Twitter and
Foursquare, which we denote as tf for simplicity; single-layer
links on Foursquare only (denoted as fo); single-layer
links on Twitter only (denoted as to), and compare these to
unconnected pairs of users (denoted as na). We consider reciprocal
Twitter links only, where $e_{ij}, e_{ji} \in E^{T}$. Reciprocal
relationships in Twitter have been considered as equivalent
to undirected ones in other OSNs (Kwak et al. 2010).
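The four labels can be assigned to a pair of users with a simple lookup. This sketch assumes both layers are stored as sets of undirected pairs; the function and variable names are hypothetical.

```python
def link_type(i, j, twitter_links, foursquare_links):
    """Label a pair as tf (multiplex), to, fo, or na (unconnected)."""
    pair = frozenset((i, j))
    on_twitter = pair in twitter_links
    on_foursquare = pair in foursquare_links
    if on_twitter and on_foursquare:
        return "tf"
    if on_twitter:
        return "to"
    if on_foursquare:
        return "fo"
    return "na"


# Hypothetical layers stored as sets of undirected pairs.
t_links = {frozenset(("ann", "bob")), frozenset(("bob", "cat"))}
f_links = {frozenset(("ann", "bob")), frozenset(("cat", "dan"))}
labels = [
    link_type("ann", "bob", t_links, f_links),
    link_type("cat", "bob", t_links, f_links),
    link_type("cat", "dan", t_links, f_links),
    link_type("ann", "dan", t_links, f_links),
]
```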


Multiplexity and Neighbourhoods 

The number of common friends has been shown to be an important
indicator of a link in social networks (Liben-Nowell
and Kleinberg 2007). Moreover, the neighbourhood overlap
weighted on the popularity of common links between two
users has been shown to be a good predictor of friendship in
online networks (Adamic and Adar 2001). Figure 4 shows
the Adamic/Adar metric of neighbourhood similarity across
the various single and multilayer neighbourhoods described
in Section 3, and the four link types.

The Adamic/Adar metric is distinctly higher for multiplex
links. In agreement with previous studies of tie
strength (Gilbert and Karahalios 2009), we observe that multiplex
links share a greater overlap in all single and multilayer
neighbourhoods. In single-layer neighbourhoods (Figures
4a and 4b) we observe that after multiplex links, those
links internal to the network under consideration have a
higher overlap than exogenous ones (to in Figure 4a and fo
in Figure 4b), followed by unconnected pairs, which have
the least overlap.

(a) Twitter Adamic/Adar overlap (b) Foursq. Adamic/Adar overlap (c) $\Gamma_{GN_{ij}}$ Adamic/Adar overlap (d) $\Gamma_{CN_{ij}}$ Adamic/Adar overlap
Figure 4: CCDF of the log Adamic/Adar metric for the different neighbourhoods between the four link types.

With respect to the multilayer neighbourhoods, we can
observe a much more pronounced overlap across the link
types. While the global neighbourhood overlap follows a
similar distribution to the single-layer neighbourhoods but
at a much lower scale, in Figure 4 we can observe more
clearly that unconnected pairs share few if any neighbours,
while multiplex links have a significant overlap.
With respect to the global neighbourhood (Figure 4c), both
Foursquare only and Twitter only links share significantly
more overlap (scale is higher on x axis) than when observing
the single-layer neighbourhoods in Figures 4a and 4b.
This indicates that some common neighbours lie across layers,
and not just within them, with the global neighbourhood revealing
a more complete image of connectivity, which stretches beyond
the single network.

The core neighbourhood overlap is most prominent for
multiplex links (Figure 4d), which indicates that they share
more friends across networks than any other type of link.
While this is expected, it confirms that the neighbourhood
overlap is a good indicator of multiplexity in ties, and is
particularly strengthened in its weighted form through the
Adamic/Adar metric of neighbourhood similarity.

Multiplexity and Interaction 

The volume of interactions between users is often used as
a measure of tie strength (Onnela et al. 2007). In this section
we compare how the volume of interactions reflects on
multiplex and single-layer links. We consider the following
interactions on Twitter and Foursquare:

Number of mentions: This interaction feature simply measures
the number of times user $i$ has mentioned user $j$ on
Twitter during the period. Any user on Twitter can mention
any other user and need not have a directed or undirected
link to the user they are mentioning.

Number of common hashtags: Similarity between users
on Twitter can be captured through common interests. Topics
are commonly expressed on Twitter with hashtags using
the # symbol. Similar individuals have been shown to have
a greater likelihood of forming a tie through the principles
of homophily (McPherson, Smith-Lovin, and Cook 2001).


Number of colocations: The number of times two users
have checked in to the same venue within a given time window.
In order to reduce false positives, we consider a short
time window of 1 hour only. Two users being at the same place,
at the same time, on multiple occasions increases the likelihood
of them knowing each other (and having a link on
social media). We weight each colocation on the popularity
of a place in terms of the total user visits, to reduce the probability
that colocation is by chance at a large hub venue such
as an airport or train station.
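A sketch of the colocation feature follows. The text does not give the exact weighting, so this example assumes each colocation contributes the inverse of the venue's total visit count; that choice, along with the function name and toy check-ins, is an assumption of the sketch.

```python
from datetime import datetime, timedelta


def weighted_colocations(checkins_i, checkins_j, venue_visits,
                         window=timedelta(hours=1)):
    """Sum inverse-popularity weights over same-venue check-ins within the window.

    checkins_*: list of (venue, datetime) tuples for each user.
    venue_visits: dict mapping venue to its total number of user visits.
    """
    score = 0.0
    for venue_i, time_i in checkins_i:
        for venue_j, time_j in checkins_j:
            if venue_i == venue_j and abs(time_i - time_j) <= window:
                score += 1.0 / venue_visits[venue_i]  # assumed weighting
    return score


# Hypothetical check-ins on one day.
day = datetime(2012, 6, 1)
user_i = [("cafe", day.replace(hour=10)), ("airport", day.replace(hour=12))]
user_j = [
    ("cafe", day.replace(hour=10, minute=30)),     # within 1 hour: counted
    ("airport", day.replace(hour=15)),             # 3 hours apart: ignored
    ("airport", day.replace(hour=12, minute=45)),  # within 1 hour: counted
]
visits = {"cafe": 4, "airport": 100}
coloc = weighted_colocations(user_i, user_j, visits)
```

With this weighting the meeting at the small cafe contributes 0.25 while the one at the busy airport contributes only 0.01.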

Distance: Human mobility and distance play an important
role in the formation of links, both online and offline, and
have been shown to be highly indicative of social ties and
useful for link prediction (Wang et al. 2011). We calculate
the distance between the geographic coordinates of two
users’ most frequent check-in locations as the Haversine distance,
the most common measure of great-circle spherical
distance:

$$dist_{ij} = haversine(lat_i, lon_i, lat_j, lon_j) \quad (8)$$

where the coordinate pairs for $i, j$ are of the places where
those users have checked in most frequently, equivalent to
the mode in the multiset of venues where they have checked
in. We only consider users who have more than two check-ins
over the whole period, and resolve ties by picking an arbitrary
venue location from the top ranked venues of a user.
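The distance feature of Equation 8 can be sketched as follows, using the standard Haversine formula. The mean Earth radius of 6371 km, the function names, and the representation of check-ins as coordinate tuples are assumptions of this example.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def most_frequent_location(checkins):
    """Mode of a user's check-in coordinates, or None with too few check-ins."""
    if len(checkins) <= 2:  # only users with more than two check-ins
        return None
    # Ties are resolved by whichever top-ranked venue Counter returns first.
    return Counter(checkins).most_common(1)[0][0]


# Hypothetical check-in coordinates (lat, lon) for two users.
home_i = most_frequent_location([(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)])
home_j = most_frequent_location([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])
dist_ij = haversine(*home_i, *home_j)
```

One degree of longitude along the equator corresponds to roughly 111 km, which is the distance this toy pair yields.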

In Figures 5a to 5d we observe four types of geographic
and social interaction on the two social networking services,
where each box-and-whiskers plot represents an interaction
between multiplex links (tf), Twitter only (to), Foursquare
only (fo), and unconnected pairs (na) on the x axis. On the y
axis we can observe the distribution in four quartiles, representing
25% of values each. The dark line in the middle of
the box represents the median of the distribution, while the
dots are the outliers. The “whiskers” represent the top and
bottom quartiles, while the boxes are the middle quartiles of
the distribution.

In terms of Twitter mentions (Figure 5), multiplex ties
and non-connected pairs of users exhibit an overall greater
number of mentions than any other group, including the
Twitter only group. It is uncommon for pairs connected
on Foursquare only to mention each other. Mentions are quite
common between users who are not connected on any network,
which may be a result of mentioning celebrities and
other commercial accounts. This is not the case for hashtags,
where we find that almost all unconnected users share 10
or fewer hashtags with the exception of outliers. Hashtags
distinguish the link type between users better than mentions.





















With regard to Foursquare interaction, multiplex ties
have the highest probability of multiple colocations, with
Foursquare only and Twitter only ties having fewer, and unconnected
pairs fewer still with the exception of some outliers.
In terms of distance, Twitter only and unconnected pairs
are the furthest apart in terms of most frequented location,
making multiplex and Foursquare links more distinguishable
through this feature, as those pairs have less distance
between their most frequented locations.

Although there is certainly greater interaction between
multiplex links, followed by Twitter only and Foursquare
only links, we would like to eliminate the randomness introduced
by the positive results for unconnected pairs (na).
We propose two multilayer interaction metrics combining
heterogeneous features from both networks in order to better
distinguish between the different link types. Firstly, we
define the global similarity as the Twitter similarity over
Foursquare distance as:

$$sim_{GN_{ij}} = \frac{sim_{ij}^{a}}{dist_{ij}^{b}}$$
where sim can be replaced with any type of similarity,
which is the mass or sum of that similarity for a pair of users,
and a, b are exponents which can be tuned to optimise the
features. Figure 5f shows how this feature captures the different
levels of links (a=2, b=1). We additionally frame a
feature which captures the complete interaction across layers
of social networks:

$$int_{GN_{ij}} = \sum_{\alpha} int_{ij}^{\alpha}$$

where $int^{\alpha}$ can be any type of interaction on layer
$\alpha$. This can be further refined by giving a weight to
each interaction, but in our case we consider the coefficient
to be equal to 1 and use colocations from the Foursquare layer
and mentions from the Twitter layer to express the global
interaction of two users in the multilayer network. This
feature allows us to capture the levels of different link
types significantly better, as shown in Figure 5e.
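
As an illustration only, the two combined features can be sketched in Python as follows. The function names, the `eps` guard against a zero distance, the kilometre unit and the toy values are assumptions made for this sketch and are not taken from our analysis; the exponents default to the a=2, b=1 setting reported for Figure 5f.

```python
def global_similarity(sim, dist, a=2.0, b=1.0, eps=1e-9):
    """Similarity over distance for one user pair: sim**a / dist**b.

    `eps` is a hypothetical guard so that a zero distance does not
    raise a division error; it is not part of the definition.
    """
    return sim ** a / max(dist, eps) ** b


def global_interaction(interactions, weights=None):
    """Weighted sum of per-layer interaction counts for one user pair.

    `interactions` maps a layer name to an interaction count. With no
    weights given, every coefficient is 1, as in the text.
    """
    if weights is None:
        weights = {layer: 1.0 for layer in interactions}
    return sum(weights[layer] * count for layer, count in interactions.items())


# Toy pair (assumed values): 4 shared hashtags, most frequented
# locations 2 km apart, 3 colocations and 5 mentions.
sim_gn = global_similarity(sim=4, dist=2.0)
int_gn = global_interaction({"foursquare": 3, "twitter": 5})
print(sim_gn, int_gn)
```

Passing a `weights` dictionary corresponds to the refinement mentioned above, in which each interaction type receives its own coefficient.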

Although we base our analysis on only two of many possible
communication channels online, we are nonetheless able to
observe the greater overlap of neighbourhoods and higher
intensity of interaction characteristic of multiplex links,
which is consistent with the theory of media multiplexity
(Haythornthwaite 2005). We evaluate the predictive
performance of the union of the features presented in the
following section.

Multiplexity & Link Prediction 

In this section we address the link prediction problem across
layers of social networks, and aim to answer our final two
research questions: Can we predict one network using
information from the other?, and Can we predict multiplex links
in OSNs? We evaluate the likelihood of forming a social
tie as a process that depends on a union of factors, using the
Foursquare, Twitter, and multilayer features we have defined
up until now in a supervised learning approach, and comparing
their predictive power in terms of AUC scores for each
feature set respectively.

(e) Colocations and mentions (f) Hashtags over distance
Figure 5: Interaction features for the different link types.

Prediction Space 

The main motivation for considering multiple social networks
in a multilayer construct is that each layer carries with
it additional information about the links between the same
users, which can potentially enhance the predictive model.
In light of the multilayer nature of OSNs, we are also
interested in whether we can achieve better prediction by
combining features from multiple networks.

Formally, for two users $i$ and $j$ drawn from the nodes
(users) that are present in any layer of the multilayer
network, we employ a set of features that output a score
$r_{ij}^{\alpha}$ so that all possible pairs $(i, j)$ are
ranked according to their expectation of having a link
$e_{ij}^{\alpha}$ on a specific layer $\alpha$ in the network.
We specify and evaluate two distinct prediction tasks.

Our first goal is to rank pairs of users based on their
interaction on one social network in order to predict a link
on the other. This entails using mobility interactions to
predict social links on Twitter, and using social interactions
on Twitter to predict links on Foursquare. Subsequently, we
are interested in predicting the multiplex links at the
cross-section of the two networks using multilayer features.
This type of link has both structural and social tie
implications, as we have demonstrated in this work, which
makes such links desirable to identify.

(a) Foursq. link prediction (b) Twitter link prediction (c) Twitter feature set (d) Foursquare feature set (e) Multilayer feature set
Figure 6: ROC curves for the Random Forest classifier and Area Under the Curve (AUC) scores for each city dataset.

We perform our evaluation on the three datasets described at
the start of this work in Section 5, where we have Twitter,
Foursquare, and the derived multilayer features for the cities
of San Francisco, Chicago, and New York. We adopt a supervised
learning approach for the prediction tasks, treating each
city as an independent multilayer network in which we train
and test on different layers. Supervised learning
methodologies have been proposed as a better alternative to
unsupervised models for link prediction
(Lichtenwalter, Lussier, and Chawla 2010).

We compare the performance of feature sets using the
Random Forest classifier (Breiman 2001) with a 10-fold
cross-validation testing strategy: for each test we train on
90% of the data and test on the remaining 10%. For every
test case the user pairs in the test set were ranked according
to the scores returned by the classifiers for the positive class
label (i.e., for an existing link), and subsequently, Area
Under the Curve (AUC) scores were calculated by averaging
the results across all folds. We use the AUC score as a measure
of performance because it considers all possible thresholds
of probability in terms of true positive (TP) and false
positive (FP) rates, which are computed by comparing the
predicted output against the target labels of the test data.
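
To make the evaluation measure concrete, the following is a minimal sketch of a rank-based AUC averaged over folds. It uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair; the function names and the toy scores are assumptions for illustration and this is not the library routine used in our experiments.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are +1 for an existing link and anything else otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative pair")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over (labels, scores) test folds."""
    return sum(auc_score(y, s) for y, s in folds) / len(folds)


# Two toy test folds (assumed classifier scores for the positive class).
folds = [
    ([1, -1, 1, -1], [0.9, 0.2, 0.6, 0.7]),
    ([1, -1, -1, 1], [0.8, 0.1, 0.3, 0.4]),
]
print(mean_auc(folds))
```

Because the score depends only on the ordering of pairs, no single decision threshold has to be fixed, which is the property motivating its use here.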


In terms of algorithmic implementation, we have used
public versions of the algorithms available in (Pedregosa
et al. 2011). The features presented earlier in this work,
of which each feature set is composed, are summarised in
Table 2. We denote the Twitter neighbourhood as $\Gamma^{T}$
and the Foursquare neighbourhood as $\Gamma^{F}$. Next, we
specify each prediction task and present the results of the
supervised learning evaluation in terms of the predictive
power of each feature set in both tasks.


Table 2: Summary of link features.


RQ3: Cross-network prediction 


The Receiver Operating Characteristic (ROC) curves (defined
as the True Positive versus False Positive Rate for varying
decision thresholds) and the corresponding Area Under
the Curve (AUC) scores are shown in Figure 6 for the three
datasets. We now discuss these results with respect to each
task. In the first prediction task, for a pair of users $i$
and $j$ we define a feature vector $x_{ij}$ encoding the
values of the users' feature scores on layer $\alpha$ in the
multilayer network. We also specify a target label
representing whether the user pair is connected on the
layer $\beta$ under prediction.
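
A minimal sketch of how such a cross-layer dataset could be assembled is given below: features come from one layer and labels from the other. The function name, the dictionary layout, the treatment of links as undirected, and the toy feature values are all assumptions made for illustration.

```python
def cross_layer_dataset(features_alpha, edges_beta):
    """Pair each feature vector from layer alpha with a label from layer beta.

    features_alpha: dict mapping a user pair (i, j) to its feature vector.
    edges_beta: set of frozenset({i, j}) links on the layer under prediction
                (links are treated as undirected in this sketch).
    Returns a list of ((i, j), x_ij, label) rows with label +1 or -1.
    """
    rows = []
    for (i, j), x_ij in sorted(features_alpha.items()):
        linked = frozenset((i, j)) in edges_beta
        rows.append(((i, j), x_ij, 1 if linked else -1))
    return rows


# Toy example: Foursquare features (colocations, distance) used to
# predict Twitter links.
foursquare_features = {
    ("u1", "u2"): [3, 0.4],
    ("u1", "u3"): [0, 12.5],
    ("u2", "u3"): [1, 2.1],
}
twitter_edges = {frozenset(("u1", "u2")), frozenset(("u2", "u3"))}

dataset = cross_layer_dataset(foursquare_features, twitter_edges)
for pair, x_ij, label in dataset:
    print(pair, x_ij, label)
```

Swapping the roles of the two arguments gives the reverse task, in which Twitter features are used to predict Foursquare links.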

We use the supervised Random Forest classifier (45 trees,
optimised with tree depth = 25) to predict links from one
layer using features from the other. Figure 6a shows the
ROC curves and respective AUC scores for each dataset in
predicting Foursquare links from Twitter features, with AUC
scores of 0.7 for the New York dataset, 0.73 for San
Francisco, and 0.81 for Chicago. We compare this to the reverse
task of predicting Twitter links using Foursquare features in
Figure 6b, where we obtain AUC scores of 0.86, 0.73, and
0.79 for the three cities respectively. We observe slightly
higher results for Twitter links, and we note that this may
be a result of the higher number of Twitter links in our
dataset or of the greater difficulty of the inverse task.


RQ4: Multiplex link prediction 

In our second prediction task, we are interested in evaluating
the performance of each feature set in predicting link
multiplexity. Given a feature vector $x_{ij}$, we would like
to predict a target label indicating whether a link exists on
both layers (+1) or on none (-1). We compare the performance
of the multilayer features to the Twitter and Foursquare sets.
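
The labelling for this task can be sketched as below. The text defines only the +1 (both layers) and -1 (no layer) classes; returning `None` for pairs linked on a single layer, so that they can be left out of the task, is an assumption of this sketch, as are the function name and the undirected treatment of links.

```python
def multiplex_label(pair, twitter_edges, foursquare_edges):
    """Return +1 for a multiplex link, -1 for an unconnected pair.

    Pairs connected on exactly one layer get None so the caller can
    exclude them (an assumption of this sketch).
    """
    key = frozenset(pair)
    on_twitter = key in twitter_edges
    on_foursquare = key in foursquare_edges
    if on_twitter and on_foursquare:
        return 1
    if not on_twitter and not on_foursquare:
        return -1
    return None


twitter_edges = {frozenset(("u1", "u2")), frozenset(("u2", "u3"))}
foursquare_edges = {frozenset(("u1", "u2"))}

pairs = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]
labels = {pair: multiplex_label(pair, twitter_edges, foursquare_edges) for pair in pairs}
print(labels)
```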

In this task, we use all three feature sets to predict
multiplex links, which generally exhibit signs of a stronger
online bond through interaction and structural properties, as
we have seen in the first part of this work. In Figures 6c and
6d we observe how Twitter and Foursquare features perform
in predicting multiplex links, again using the Random Forest
algorithm, with the highest AUC scores of 0.82 and
0.84 for each set respectively. The Foursquare feature set
performs better in terms of AUC scores, but the multilayer
feature set outperforms both (AUC = 0.88 for Chicago), due
to its inclusion of features from different layers and
cross-layer structural properties.

In conclusion, it is possible to predict links between
heterogeneous social networks and to predict multiplex links
spanning multiple networks using multilayer features, as we
have seen in our subset of users. We discuss the applications
of these results in the following section.


Discussion & Conclusions 

In this work we have demonstrated the structural and
interaction properties of links across two online social
networks and have also shown the value of multilayer features
in predicting links on both Twitter and Foursquare, and
multiplex links. We believe that the primary contribution is
methodological, since it provides a novel framework for
investigating multiplexity across different social networks.
The techniques discussed in this work are general and can
potentially be used to investigate other scenarios for which
datasets containing information about social interactions
across multiple networks are available. In this section, we
discuss the implications, limitations and real-world
applications of these results.


Implications 

Recently, social media has been increasingly alluded to as
an ecosystem. The parallel comes after the emergence of
multiple OSNs, interacting as a system, while competing for
the same resources - users and their attention. We have
addressed this system aspect by modelling multiple social
networks as a multilayer online social network in this work.
We have also identified two extensions of the node
neighbourhood. The global neighbourhood or degree gives
insight into a user's full connectivity across services; this
is especially important when considering users with asymmetric
activity and degree across networks, since their centrality
in the online ecosystem can be under- or over-estimated. We
additionally defined the core degree, which on the other hand
reveals the intersection across networks, and therefore the
stronger online ties - those relevant on multiple networks.
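
As a small illustration of the two neighbourhood extensions, the sketch below takes the global neighbourhood as the union and the core neighbourhood as the intersection of a user's neighbour sets across layers. The data layout (a dictionary of per-layer adjacency dictionaries), the function names and the toy users are assumptions for this example.

```python
def global_neighbourhood(user, layers):
    """Union of the user's neighbours over all layers."""
    result = set()
    for adjacency in layers.values():
        result |= adjacency.get(user, set())
    return result


def core_neighbourhood(user, layers):
    """Intersection of the user's neighbours over all layers."""
    neighbour_sets = [adjacency.get(user, set()) for adjacency in layers.values()]
    if not neighbour_sets:
        return set()
    return set.intersection(*neighbour_sets)


# Toy multilayer network: layer name -> {user: set of neighbours}.
layers = {
    "twitter": {"ana": {"bob", "cat", "dan"}},
    "foursquare": {"ana": {"bob", "eve"}},
}

global_degree = len(global_neighbourhood("ana", layers))
core_degree = len(core_neighbourhood("ana", layers))
print(global_degree, core_degree)
```

In this toy case the user has four distinct contacts overall but only one contact present on both layers, which is the kind of asymmetry that a single-layer degree would not reveal.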

The strength of ties manifested through multiplexity
is expressed through a greater intensity of interactions
and greater similarity across attributes, both in the offline
(Haythornthwaite 2005; Hristova, Musolesi, and Mascolo 2014)
and in the online context, as we have seen in this
work. We have introduced a number of features which take
into consideration the multilayer neighbourhood of users
in OSNs. The Adamic/Adar coefficient of neighbourhood
similarity in its core neighbourhood version proved to be a
strong indicator of multiplex ties. Additionally, we
introduced combined features, such as the global interaction
and similarity over distance, which reflect the type of link
that exists between two users more distinctively than their
single-layer counterparts. These features can be applied
across multiple networks and can be flexible in their
construction according to the context of the OSNs under
consideration.
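
For reference, a minimal sketch of an Adamic/Adar style score computed over a chosen neighbourhood (for example, the core neighbourhood) is given below. It follows the usual form of the coefficient, summing 1/log of the neighbourhood size of each common neighbour; the function name, the input layout and the toy graph are assumptions, and the exact multilayer formulation is the one given in the feature definitions earlier in this work.

```python
import math


def adamic_adar(i, j, neighbourhood):
    """Adamic/Adar score of users i and j over a given neighbourhood map.

    neighbourhood: dict mapping each user to a set of neighbours, which
    may be a single-layer, global or core neighbourhood.
    Common neighbours with fewer than two neighbours are skipped to
    avoid dividing by log(1) = 0 (a guard added for this sketch).
    """
    common = neighbourhood.get(i, set()) & neighbourhood.get(j, set())
    score = 0.0
    for z in common:
        size = len(neighbourhood.get(z, set()))
        if size > 1:
            score += 1.0 / math.log(size)
    return score


# Toy core neighbourhood: i and j share z1 (2 neighbours) and z2 (3 neighbours).
core = {
    "i": {"z1", "z2"},
    "j": {"z1", "z2"},
    "z1": {"i", "j"},
    "z2": {"i", "j", "k"},
    "k": {"z2"},
}
print(adamic_adar("i", "j", core))
```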

Limitations 

Media multiplexity is fascinating from the social networks
perspective as it can reveal the strength and nature of a
social tie given the full communication profile of people
across all media they use (Haythornthwaite 2005).
Unfortunately, full online and offline communication profiles
of individuals were not available and our analysis is limited
to two social networks. Nevertheless, we have observed some
evidence of media multiplexity manifested in the greater
intensity and structural overlap of multiplex links and have
gained insight into how we can utilise these properties for
link prediction. Certainly, considering more OSNs and further
relating media multiplexity to its offline manifestation is
one of our future goals, and we believe that with the further
integration of social media services and availability of data
this will be possible in the near future.

Our data is limited to a sub-sample of users who we know
have active accounts on both networks in three US cities,
with Foursquare check-ins also limited to those posted on
Twitter. This excludes a number of users who may have
Foursquare accounts but have not linked them on Twitter.
Nevertheless, we were able to show that it is possible to
predict one social network from the other in a cross-network
manner, and we hope to extend our prediction and analysis
to a greater scale and geographical scope in the future.


Applications 

Most new OSNs use contact list integration with external
existing networks, such as copying friendships from Facebook
through the open graph protocol
(https://developers.facebook.com/docs/opengraph). Copying
links from pre-existing social networks to new ones results
in higher social interaction between copied links than
between links created natively in the platform
(Zhong et al. 2014). We propose that extending this copied
network with a rank of relevance of contacts using
multiplexity can provide even further benefits for newly
launched services.

In addition to fostering multiplexity, however, new OSNs,
and especially interest-driven ones such as Pinterest, may
benefit from similarity-based friend recommendations. In this
work, we apply mobility features and neighbourhood
similarity from Foursquare to predict links on Twitter and
vice versa, highlighting the relationship between similar
users across heterogeneous platforms. Similarly, in
(Tang, Lou, and Kleinberg 2012), the authors infer types
of relationships across different domains such as mobile and
co-author networks. Although they use a transfer knowledge
framework rather than exogenous interaction features as we
do, the authors also agree that integrating social theory in
the prediction framework can greatly improve results. The
present work is a step towards understanding the composite
nature of online social network services and hopefully
towards enhancing their functionality and purpose.